Image watermarking in the DCT domain has high computational complexity, especially for color and high-resolution images, whose use has grown significantly. To address this issue, this article proposes a data-parallel color DCT watermarking approach and implements it on a GPU using CUDA. In addition, the color watermark is compressed with a modified method before embedding to reduce distortion. The CUDA implementation of the 8×8 DCT achieves a 12x to 43x speedup on a GT 540M and a 94x to 105x speedup on a GTX 580, across different image sizes. For the embedding procedure, the GT 540M achieves a speedup between 7x and 26x and the GTX 580 a speedup between 46x and 73x, across the case studies. For the extraction procedure, the GT 540M achieves a speedup between 10x and 29x and the GTX 580 a speedup between 75x and 80x.
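To illustrate why the 8×8 DCT suits a data-parallel implementation, the following minimal sketch computes an orthonormal 2-D DCT-II on each non-overlapping 8×8 tile of a single color channel. It is plain sequential Python, not the article's CUDA code; the function names `dct_8x8` and `blockwise_dct`, the orthonormal scaling, and the assumption that the image dimensions are multiples of 8 are all illustrative choices rather than details taken from the article.

```python
import math

N = 8  # tile size of the 8x8 DCT


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of one 8x8 tile (list of 8 rows of 8 numbers)."""
    def alpha(k):
        # Normalisation factor that makes the transform orthonormal (assumed scaling).
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def blockwise_dct(channel):
    """Apply dct_8x8 to every non-overlapping 8x8 tile of one colour channel.

    Assumes the height and width of `channel` are multiples of 8.
    """
    h, w = len(channel), len(channel[0])
    result = [[0.0] * w for _ in range(h)]
    # Every (by, bx) tile is transformed independently of all the others,
    # which is the property a data-parallel GPU version can exploit.
    for by in range(0, h, N):
        for bx in range(0, w, N):
            tile = [row[bx:bx + N] for row in channel[by:by + N]]
            coeffs = dct_8x8(tile)
            for i in range(N):
                result[by + i][bx:bx + N] = coeffs[i]
    return result
```

Because no tile reads or writes data belonging to another tile, the two outer loops in `blockwise_dct` carry no dependencies. One plausible GPU mapping, stated here as an assumption, would assign each tile to its own group of threads.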